Abstract


A three-round algorithm is presented that guarantees agreement in a system of K ≥ 3F+1
nodes, provided each faulty node induces no more than F faults and each good node experiences
no more than F faults, where F is the maximum number of simultaneous faults in the network.
The algorithm is based on the Oral Message algorithm of Lamport et al., is scalable with
respect to the number of nodes in the system, and applies equally to the traditional node-fault
model and the link-fault model. We also present a mechanical verification of the
algorithm, focusing on verifying the correctness of a bounded model of the algorithm and
confirming claims of determinism.

Keywords: Oral Message, Agreement, Byzantine, fault tolerant, synchronization, distributed,
model checking

1. Introduction


Distributed systems have become an integral part of safety-critical computing applications, 
necessitating system designs that incorporate complex fault-tolerant resource management 
functions to provide globally coordinated operations with ultra-reliability. As a result, robust 
clock synchronization has become a required fundamental component of fault-tolerant
safety-critical distributed systems. Synchronization has practical significance as a fundamental service
for higher-level algorithms that solve other problems. For example, in safety-critical TDMA 
(Time Division Multiple Access) architectures [1, 2, 3], synchronization is the most crucial 
element of these systems. Typically, the assumed topology is a regular graph such as a fully 
connected graph or a ring since they provide a base case to solve the distributed synchronization 
problem. 

A fundamental property of a robust distributed system is the capability of tolerating and 
potentially recovering from failures (loss of service due to a fault) that are not predictable in 
advance. A fault is a defect or flaw in a system component resulting in an incorrect state [2, 4]. 
In the context of fault-tolerant distributed systems, a fault presenting different symptoms to 
different observers is known as a Byzantine (arbitrary) fault. We assume that there are a 
maximum of F simultaneous faults in the network. The requirement to handle faults adds a new
dimension to the complexity of synchronizing distributed systems.

We call an approach to solving the clock synchronization problem direct if it relies solely on 
local (node level) detection and filtering of faults. This approach is primarily limited to detecting 
timing and/or value faults of a node’s incoming messages. In contrast, we call an approach 
indirect if it relies on the network level detection and filtering of faults independent of, and in 
addition to, the local detection and filtering of the faults. This approach however requires 
coordination at the network level. 

Thus far, there is no verifiable solution for the general case of the clock synchronization 
problem, where the topology is arbitrary and any number of various types of faults are tolerated. 
Furthermore, most attempts have tried to solve this problem directly, although there are
some approaches to solve this problem indirectly using authenticated (signed) messages [5].
Driscoll et al. in [6], however, argue that “while the arguments of unforgeable signed messages
make sense in the context of communicating generals, the validity of necessary assumptions in a 
digital processing environment is not supportable. In fact, the philosophical approach of 
utilizing cryptography to address the problem within the real world of digital electronics makes 
little sense. The assumptions required to support the validity of unbreakable signatures are 
equally applicable to simpler approaches (such as appending a simple source ID or a CRC to the 
end of a message). It is not possible to prove such assumptions analytically for systems with 
failure probability requirements near 10^-9/hr.” Furthermore, addressing network element
imperfections, such as oscillator drift with respect to real time and differences in the lengths of 
the physical communication media, is necessary to make a solution applicable to realizable 
systems. 

The main issue in solving the clock synchronization problem is the lack of a symmetric view
(agreement) in the system at the participating good nodes, in the sense that two good nodes may
disagree on the message sent. However, there are a number of ways of achieving message
symmetry across the system. In [5, 7] various ideas for overcoming failures in a robust 
distributed system are addressed that include tolerating Byzantine faults. In solving the 
consensus problem, which is the ability of a set of nodes to agree on a single value despite 
failures, Schmid et al. argue in [8] that: “A fully-fledged n-process consensus algorithm is 
obtained by using a separate instance of a Byzantine agreement algorithm (with n-1 receivers) 
for disseminating any process’s local value, and using a suitable choice function (majority) for 
the consensus result*.” The consensus problem, and hence the idea proposed by Schmid et al., is
based on an inherent assumption of synchrony among the good nodes, and so is not applicable to
solving the clock synchronization problem.

Other methods include using a variety of engineering practices, e.g., using a self-checking pair at
the node level [9, 10] or central guardians at the system level [11, 12]. However, as Driscoll et
al. reported in [6], the correctness claims of these approaches may not be verifiable. Furthermore,
we believe that to be generally useful, algorithms that guarantee agreement must be able to
handle non-authenticated messages. Thus, the crux of our idea, as proposed in [13], is to solve
this problem indirectly by first converting any message to a symmetric message, and then using a
verified protocol based on the symmetry assumption to solve the synchronization problem.

The Oral Message (OM) algorithm of Lamport et al. [5] that solves the Byzantine Agreement 
(BA) problem [14] is also an indirect approach, and is meant to reliably transform a message 
from a single source to a symmetric message (an agreement). The OM algorithm has been 
proven to reach agreement at the network level for a given source [5, 14, 15] and does not 
require initial synchrony among the good nodes. The OM algorithm requires F+1 rounds of exchanges
and, with a message complexity of O(K^F), the number of exchanged messages grows
exponentially as F grows linearly. Therefore, the use of the OM algorithm for F > 2 is very
costly and impractical.

In this paper, we present an alternative for achieving agreement, hereafter referred to as the 3ROM
(3 Rounds using OM) algorithm, that is based on the OM algorithm. The 3ROM algorithm assumes each
node Ni, i = 1..K, either induces up to F faults if it is a faulty node, or experiences no more than F
faults if it is a good node. It further assumes that the maximum number of simultaneous faults in 
the network is limited to F. We indicate the number of faults associated with Ni by fi; thus,
F = Σfi and fi ≤ F. The 3ROM algorithm is independent of the fault model (node-fault or
link-fault model) and, as the name implies, achieves agreement in three rounds. Thus, it is
independent of the number of faults (in terms of number of required rounds, not the amount of 
messages). The algorithm has a message complexity of O(K^3), and is also scalable with respect
to K. We also present the model checking results of a bounded model of the algorithm to verify 
its correctness. 

This paper is organized as follows. We describe the fault models in Section 2. In Section 3 we 
provide a system overview. We present the 3ROM algorithm and its formal proof in Section 4. 
In Section 5, we present the model checking efforts toward verification of correctness of a 
bounded model of the algorithm and the results of that effort. Finally, we present concluding 
remarks in Section 6. 


* Since we use Ni to address a node, we use K here instead of n as is traditionally used in the literature.

2. Fault Models


In synchronous message-based distributed systems, a fault is typically defined as a message that 
was not transmitted when it was expected or a message that was transmitted but not received or 
received but not accepted, i.e., deemed invalid by a receiver. Thus, the fault is either associated 
with the source node of the message, the corresponding link between the source node and the 
destination node, or the destination node. Consequently, there are two viewpoints, node-centric 
and link-centric, and thus, there are two ways of modeling faults. In the node-centric model, 
which we refer to as the node-fault model, the faults are associated with the source node of the 
message and all fault manifestations between the source and the destination nodes for the 
messages from that source count as a single fault, which is especially the case when the faults are
associated with a Byzantine faulty node [5, 6, 16, 17]. In this model all links are assumed to be
good. Miner et al. [16], for instance, model the absence of a link as a link fault, and even though
both node and link failures are considered, they abstractly model link failures as failures of the
source node.

In the link-centric model, which we refer to as the link-fault model, a fault is associated with the 
communication means connecting the source node to the destination node. In this model, all 
nodes are assumed to be good and an invalid message at the receiving node is counted as a single 
fault for the corresponding input link. Thus, from the global perspective, a Byzantine faulty 
node manifests as one or more link failures. 

A link-fault model introduced by Schmid et al. [18] is called the perception-based hybrid fault
model, where faults are viewed from the perspective of the receiving nodes. Faults are
associated with their input links, and all nodes are assumed to be good. They argued that since F 
faulty nodes can produce at most F faulty perceptions in any node, the link-fault model is 
compatible with the traditional node-fault model and so, all existing lower bound and 
impossibility results remain valid. 

“In the perception-based model, the system-wide number of faults is replaced by the number of
faults that are observable in the nodes’ local ‘perceptions’ of the system. Formally, node r’s
perception vector V_r = (V_r^1, V_r^2, ..., V_r^K) is considered, where every perception V_r^s ∈ V_r represents
the message node r received from node s in some specific round; type and value(s) depend upon
the particular algorithm considered” [18]. In that paper, Schmid et al. present a solution for the
synchronous deterministic consensus problem, where all nodes are expected to achieve
agreement on a single value, in synchronous distributed systems with link faults.


3. System Overview 

We consider “synchronous” message-passing distributed systems and model the system as a
graph with a set of nodes (vertices) that communicate with each other by sending messages via a
set of communication links (edges) that represent the nodes’ interconnectivity. The underlying
topology considered is a fully connected network of K nodes that exchange messages through a
set of communication links. We leave the generalization to other topologies to future work.
The system consists of a set of good nodes and a set of faulty nodes. A good node is assumed to 
be an active participant and correctly execute the algorithms. A faulty node is either benign 
(detectably bad), symmetrically faulty, or arbitrarily (Byzantine) faulty. However, in this paper 
our primary focus is Byzantine faults. 

The communication links are point-to-point and unidirectional, each connecting a source to a
destination node. Thus, the fully connected graph consists of K(K-1) unidirectional links. A
good link is assumed to correctly deliver a message from its source node to its destination node 
within a bounded communication delay time. A faulty link does not deliver the message, 
delivers a corrupted message, or delivers a message outside the expected communication delay 
time. 

The nodes communicate with each other by exchanging broadcast messages. Broadcast of a 
message by a node is realized by transmitting the message, at the same time, to all nodes that are 
directly connected to it. The communication network does not guarantee any relative order of 
arrival of a broadcast message at the receiving nodes, that is, a consistent delivery order of a set 
of messages does not necessarily reflect the temporal or causal order of the message 
transmissions [1]. A maximum of F faults are assumed to be present in the system, where F > 0. 
We assume K ≥ 3F+1 and define the minimum number of good nodes in the system, G, by
G = K-F. The minimum number of nodes needed to maintain synchrony is well
established to be 3F+1 [6, 14, 19]. 

3.1. Communication Delay 

The communication delay between directly connected (adjacent) nodes is expressed in terms of
the minimum event-response delay, D, and network imprecision, d. These parameters are
measured at the network level. A message broadcast by a node at real time t is expected to arrive
at its directly connected adjacent nodes, be processed, and subsequent messages to be generated
by those nodes within the time interval [t+D, t+D+d]. Communication between
independently-clocked nodes is inherently imprecise. The network imprecision, d, is the maximum time
difference among all receivers of a message from a transmitting node with respect to real time.
The imprecision is due to many factors including, but not limited to, the drift of the oscillators
with respect to real time, jitter, discretization error, temperature effects and differences in the
lengths of the physical communication media. These parameters are assumed to be bounded, D
> 0, d > 0, have units of real-time clock ticks, and their values are known in the network.
The communication delay, denoted y, is expressed in terms of D and d, is defined as y = D+d,
and has units of real-time clock ticks. In other words, we assume synchronous communication
and bound the communication delay between any two directly connected adjacent nodes by
[D, y]. However, for simplicity of notation, in the remainder of this paper we assume that the
messages arrive logically at the same time at the destination nodes.
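
The timing parameters of this section can be captured in a small helper. The following Python sketch is illustrative only: the class name `LinkTiming` and its methods are assumptions introduced here, not part of the paper's model, and all times are taken to be integer real-time clock ticks.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LinkTiming:
    """Timing bounds between adjacent nodes, in real-time clock ticks."""
    D: int  # minimum event-response delay
    d: int  # network imprecision

    @property
    def y(self) -> int:
        # Communication delay y = D + d.
        return self.D + self.d

    def arrival_window(self, t: int) -> Tuple[int, int]:
        # A message broadcast at real time t is handled within [t+D, t+D+d].
        return (t + self.D, t + self.y)

    def is_timely(self, sent_at: int, handled_at: int) -> bool:
        # True if the handling time falls inside the expected window.
        lo, hi = self.arrival_window(sent_at)
        return lo <= handled_at <= hi


timing = LinkTiming(D=10, d=2)
window = timing.arrival_window(100)
```

With D = 10 and d = 2, a broadcast at tick 100 is expected to be handled between ticks 110 and 112 inclusive.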

3.2. The Sync Message And Its Validity 

In order to achieve and maintain desired synchrony, the nodes communicate by exchanging Sync
messages, where synchrony is defined as a measure of the relative imprecision of the good
nodes. A Sync message from a given source is valid if it arrives at or after one D of an
immediately preceding Sync message from that source; this is the message validity in the value
domain, i.e., valid Sync messages are rate-constrained. Assuming physical-layer error detection
is dealt with separately, the reception of a Sync message is indicative of its validity in the value
and time domains.
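
The rate constraint on Sync messages can be sketched as a per-source check. This is a minimal illustration, not the paper's implementation: the class `SyncValidator` is hypothetical, and it assumes that the spacing D is measured from the last Sync message that was itself accepted as valid, which the text does not spell out.

```python
from typing import Dict


class SyncValidator:
    """Tracks, per source, when the last valid Sync message arrived."""

    def __init__(self, D: int) -> None:
        self.D = D  # minimum event-response delay, in clock ticks
        self.last_valid: Dict[int, int] = {}

    def accept(self, source: int, arrival: int) -> bool:
        prev = self.last_valid.get(source)
        if prev is not None and arrival - prev < self.D:
            # Arrived less than D after the preceding Sync: rate violation.
            return False
        # Assumption: only valid messages move the reference point.
        self.last_valid[source] = arrival
        return True


validator = SyncValidator(D=10)
results = [
    validator.accept(1, 0),   # first Sync from source 1
    validator.accept(1, 5),   # too soon after the previous one
    validator.accept(1, 10),  # exactly D later
    validator.accept(2, 11),  # a different source is tracked separately
]
```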


4. 3ROM 

In a synchronous distributed system, the Oral Message algorithm of Lamport et al. [5] solves the 
Byzantine Agreement (BA) problem [14] by reliably transforming a message, in the presence of
faults, to a symmetric message at the network level, whereby the good nodes reach an agreement 
and collectively either accept or reject the message. The OM algorithm is recursive and every 
iteration of the execution of the algorithm constitutes a step (round) of exchange of messages by 
the nodes. An instance of the OM algorithm starts with the source node broadcasting a message, 
the first round, followed by other nodes (except the source of the message) recursively 
rebroadcasting (relaying) the messages they receive to others in subsequent rounds. For a fully 
connected graph of K nodes, the OM algorithm requires F+1 communication rounds. At the end
of the F+1 rounds, the nodes vote and reach agreement.

In this section we present a three-round algorithm, similar to the OM algorithm, that achieves
agreement among the good nodes, independent of F, provided that K ≥ 3F+1, where
K is the number of nodes, F = Σfi for i = 1..K, and fi is the number of faults associated with
Ni. For the node-fault model, the faulty nodes act arbitrarily provided their behavior is bounded
by the assumption, i.e., at any round a faulty node broadcasts a valid message to at least K-F
other nodes. Similarly, for the link-fault model, at any round no more than F link faults are
perceived at a receiving good node. We will also show that this three-round algorithm applies
equally to the node-fault and link-fault models. To simplify presentation of this algorithm, we
assume broadcast messages arrive logically at the same time at their destination good nodes. We 
later remove this simplifying assumption and show that agreement is reached at the good nodes 
within a time bound. We use two types of messages: Sync and Relay, and extend the message 
validity argument of Section 3.2 to both messages. 

The 3ROM Algorithm

The algorithm depends on two positive parameters α and β, which are used in determining the
final acceptance or rejection of a Sync message from a source node. Hence, the algorithm
described is actually 3ROM(α, β). The algorithm consists of three rounds and a vote.

Round 1 - The source node broadcasts a Sync message to all other nodes, effectively saying
“I’m here.” A node does not physically send a message to itself even though it uses its
own message. The nodes that receive the message record that the message was received.

Round 2 - All good nodes that received a Sync message in Round 1 broadcast a Relay
message to all other nodes, essentially saying “I’ve got a message.” The good nodes that
do not receive the Sync message do nothing. Note that since the source node uses its own
message, if it is a good node, it too participates in this round.

Round 3 - All good nodes that received at least α messages (Sync or Relay) in Round 2
broadcast a vector of K messages containing what they have received from all other
nodes in Rounds 1 and 2, effectively saying “This is what I’ve received from others.”
Note that, if the algorithm assumptions hold, all good nodes participate in this round.

Vote - At the end of Round 3, each node locally constructs a K×K network-level matrix M of
received messages, with entries M(i, j) ∈ {s, r, 0}, i, j = 1..K, where ‘s’ indicates having
received a Sync message, ‘r’ indicates having received a Relay message, and ‘0’ indicates not
receiving any message, i.e., a fault. A column cj of the matrix M corresponds to the
messages perceived as transmitted by Nj and a row ri of the matrix M corresponds to the
messages received by Ni. For i = 1..K, let Xi = 1 if Σci > α and Xi = 0 otherwise,
where Σci is the number of non-zero entries in column ci, i.e., treating ‘s’ and ‘r’ entries
equally. Finally, the node votes “accept” if ΣXi > β, i.e., the node accepts the message
from the source node if more than β columns of M have more than α non-zero entries
each.
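
The vote can be sketched in a few lines of Python. The function names `column_flags` and `vote` are illustrative assumptions, and the matrix is encoded as one string per row with the characters 's', 'r' and '0'. The usage example feeds it the network-level matrix that the paper gives in Table 1 (F = 2, K = 7), where entries in which a Sync was replaced by a Relay are written simply as 'r'.

```python
from typing import List, Sequence


def column_flags(M: Sequence[Sequence[str]], alpha: float) -> List[int]:
    """Return X, where X[j] = 1 if column j has more than alpha non-zero entries."""
    K = len(M)
    flags = []
    for j in range(K):
        # 's' and 'r' entries are treated equally; only '0' counts as a fault.
        nonzero = sum(1 for i in range(K) if M[i][j] != "0")
        flags.append(1 if nonzero > alpha else 0)
    return flags


def vote(M: Sequence[Sequence[str]], alpha: float, beta: float) -> bool:
    """Accept if more than beta columns have more than alpha non-zero entries."""
    return sum(column_flags(M, alpha)) > beta


# Rows are receivers N1..N7, columns are senders N1..N7 (Table 1 of the paper).
TABLE_1 = [
    "r00rr00",
    "rr00r00",
    "rrr0000",
    "r0rr000",
    "srrrr00",
    "rrrrr00",
    "0rrrr00",
]
K = 7
flags = column_flags(TABLE_1, K / 3)
accepted = vote(TABLE_1, K / 3, 2 * K / 3)
```

For this matrix the flags reproduce the Xi row of Table 1, and five flagged columns exceed β = 2K/3, so the node accepts.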

4.1. Proof Of The 3ROM Algorithm For Link-Fault Model 

The proof of correctness of the OM algorithm, and consequently, the proof of correctness of the 
3ROM algorithm, is centered on the following two properties. 

AP (Agreement Property): If receiver nodes p and q are nonfaulty, then they agree on the 

value ascribed to the transmitter. 

VP (Validity Property): If the transmitter is nonfaulty, then every nonfaulty receiver 

computes the correct value. 

In addition to the network-level matrix M that each node constructs at the end of Round 3, the
proofs to follow rely on another related matrix M_global that can be constructed at the end of Round
2. Following Round 2, each node has a vector of received messages that describes the messages
it received from each node. M_global, with entries related to the matrix M, i.e., M_global(i, j) ∈
{s, r, 0}, reflects a global view of the network at the end of Round 2. While it is inaccessible to
any node, it is related to the matrices M built by each node.

Theorem 1. For a fully connected graph with K > 3F nodes and the link-fault model, i.e., fi ≤ F,
3ROM(K/3, 2K/3) guarantees agreement at the good nodes.

Proof. The source node is a good node (all nodes are good in the link-fault model). 

Round 1 - Ns, the source node, broadcasts a Sync message and it is received correctly (valid) by
at least K-F nodes. Let H be this set of nodes.

Round 2 - Each node in H broadcasts a Relay message to all other nodes and at least K-F other
nodes will receive the Relay message correctly. Hence, for each node in H, its
corresponding column in M_global has at least K-F non-zero entries, i.e., ‘s’ and ‘r’.

Round 3 - Since at the end of Round 2, all good nodes receive messages from the nodes in H, at
least K-2F ≥ F+1 messages, all good nodes participate in this round. Each good node
broadcasts its vector of received messages to all other nodes and at least K-F nodes will
receive it correctly. At the end of this round, a node constructs its matrix M, which is
similar to, but likely different from, M_global. Since the rows of M are the messages in
Round 3, at most F rows can be different from M_global. Thus, the number of non-zero entries
in any column of M differs from that of the same column in M_global by at most F
entries. Therefore, the columns in M corresponding to nodes in H have at least K-F-F
non-zero entries. Since K > 3F (equivalently, K/3 > F), there are at least K-F > 2K/3
columns in M (the nodes in H) with at least K-2F > K/3 non-zero entries. Thus, each
node votes “accept” for 3ROM(K/3, 2K/3). □
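
The argument of Theorem 1 can be exercised with a small simulation. The sketch below is not the paper's verification model. It assumes a particular fault injector (`random_link_faults`, a hypothetical helper) that, in every round, marks links as faulty while charging no sender and no receiver with more than F faults, and it treats a faulty link as a lost message.

```python
import random


def random_link_faults(K, F, rng):
    """Pick faulty (sender, receiver) links; no sender and no receiver
    is charged with more than F faults (assumed fault injector)."""
    out_faults = [0] * K
    in_faults = [0] * K
    faulty = set()
    links = [(s, r) for s in range(K) for r in range(K) if s != r]
    rng.shuffle(links)
    for s, r in links:
        if out_faults[s] < F and in_faults[r] < F and rng.random() < 0.5:
            faulty.add((s, r))
            out_faults[s] += 1
            in_faults[r] += 1
    return faulty


def delivered(faulty, sender, receiver):
    # A node uses its own message, so the "self link" never fails.
    return sender == receiver or (sender, receiver) not in faulty


def run_3rom(K, F, source, rng):
    """Return each node's accept/reject decision for 3ROM(K/3, 2K/3)."""
    alpha, beta = K / 3, 2 * K / 3
    nodes = range(K)

    # Round 1: the source broadcasts Sync.
    faults = random_link_faults(K, F, rng)
    got_sync = [delivered(faults, source, n) for n in nodes]

    # Round 2: every node holding the Sync broadcasts Relay.
    faults = random_link_faults(K, F, rng)
    view = [[0] * K for _ in nodes]  # view[n][j] = 1 if n holds a message from j
    for n in nodes:
        for j in nodes:
            if got_sync[j] and delivered(faults, j, n):
                view[n][j] = 1
        if got_sync[n]:
            view[n][source] = 1  # the Round 1 Sync also counts

    # Round 3: nodes holding at least alpha messages broadcast their vector.
    faults = random_link_faults(K, F, rng)
    sends_vector = [sum(view[n]) >= alpha for n in nodes]

    decisions = []
    for n in nodes:
        # Row i of M is node i's vector if it reached n, else all zeros.
        M = [view[i] if sends_vector[i] and delivered(faults, i, n) else [0] * K
             for i in nodes]
        X = [1 if sum(M[i][j] for i in nodes) > alpha else 0 for j in nodes]
        decisions.append(sum(X) > beta)
    return decisions


decisions = run_3rom(K=7, F=2, source=0, rng=random.Random(0))
```

Under these assumptions every node should accept, since each column belonging to a node in H keeps at least K-2F non-zero entries at every receiver.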

Table 1 is an example of the network-level matrix at the end of Round 3 for F = 2 and K = 7.
The diagonal cells in this matrix are the messages a node sends to itself and thus
cannot be faulty. An “s→r” entry indicates that a Sync message was received in Round 2 and was
replaced by a Relay message from the same node in Round 3.

Table 1. An example of matrix of received messages at the end of Round 3.

Ni    1     2     3     4     5     6     7
1     s→r   0     0     r     r     0     0
2     s→r   r     0     0     r     0     0
3     s→r   r     r     0     0     0     0
4     s→r   0     r     r     0     0     0
5     s     r     r     r     r     0     0
6     s→r   r     r     r     r     0     0
7     0     r     r     r     r     0     0
Xi    1     1     1     1     1     0     0


Table 2 shows the matrices at N1 and N2, as examples, at the end of Round 3. The all-zero rows
(rows 6 and 7 at N1, rows 1 and 7 at N2) indicate the effects of link faults at the two nodes in
Round 3. As in Table 1, an “s→r” entry marks a Sync message that was replaced by a Relay
message from the same node.

Table 2. An example of matrix of received messages at N1 and N2 at the end of Round 3.

N1    1     2     3     4     5     6     7
1     s→r   0     0     r     r     0     0
2     s→r   r     0     0     r     0     0
3     s→r   r     r     0     0     0     0
4     s→r   0     r     r     0     0     0
5     s     r     r     r     r     0     0
6     0     0     0     0     0     0     0
7     0     0     0     0     0     0     0
Xi    1     1     1     1     1     0     0     V = 1

N2    1     2     3     4     5     6     7
1     0     0     0     0     0     0     0
2     s→r   r     0     0     r     0     0
3     s→r   r     r     0     0     0     0
4     s→r   0     r     r     0     0     0
5     s     r     r     r     r     0     0
6     s→r   r     r     r     r     0     0
7     0     0     0     0     0     0     0
Xi    1     1     1     1     1     0     0     V = 1


4.2. Proof Of The 3ROM Algorithm For Node-Fault Model 

In the classic node-fault model, a node’s message may be perceived as faulty by many other 
nodes. However, we assume there are up to F simultaneous Byzantine faulty nodes present in 
the network and they behave arbitrarily but are limited to inducing no more than F faults at 
each round, i.e., fi ≤ F.


Theorem 2. For a fully connected graph with K ≥ 3F+1 nodes and the node-fault model, i.e.,
fi ≤ F, 3ROM(K/3, 2K/3) guarantees agreement at the good nodes.

Proof. Note that a bounded-Byzantine faulty node can be modeled by a link-fault model, where 
the faulty links are exactly those where the received messages are invalid. Thus, whether the 
source node is good or Byzantine faulty, since the maximum number of bounded-Byzantine 
faulty nodes, F, is less than a third of K and since the link-fault model allows each node up to F 
faults per round, the proof of Theorem 1 applies in this case. □ 

Table 3. An example of matrix of received messages at the good nodes Ni, i = 1..5, at the end of
Round 3. N6 and N7 are the Byzantine faulty nodes and N6 is the source node.

Ni    1     2     3     4     5     6     7
1     r     r     r     0     0     s→r   0
2     r     r     r     0     0     s→r   r
3     r     r     r     0     0     s     r
4     r     r     r     0     0     s→r   0
5     r     r     r     0     0     0     r
6     -     -     -     -     -     s→r   r
7     -     -     -     -     -     s→r   r
Xi    1     1     1     0     0     1     1     V = 1

Table 3 is an example of the matrix after Round 3 at Ni. We would like to point out that, unlike
Table 2 for the link-fault model, this matrix is the same at all good nodes except for the columns
and rows corresponding to the faulty nodes, c6, c7 and r6, r7, respectively. An “s→r” entry marks
a Sync message that was replaced by a Relay message from the same node, and a ‘-’ entry in the
matrix means don’t care.

Theorem 3 (Agreement). For any F and K, for a fully connected graph with K ≥ 3F+1 nodes
and fi ≤ F, the 3ROM algorithm satisfies AP at the good nodes.

Proof. It follows from Theorems 1 and 2 that, if the assumptions are met, the 3ROM algorithm
always guarantees agreement at the good nodes. □

Theorem 4 (Validity). For any F and K, for a fully connected graph with K ≥ 3F+1 nodes and
fi ≤ F, the 3ROM algorithm satisfies AP and VP.

Proof. It follows from Theorems 1 and 2 that, if the assumptions are met, the 3ROM algorithm
satisfies the agreement and validity properties at the good nodes. □

Corollary 5. The number of rounds required by the 3ROM algorithm is independent of F.

Proof. It follows from Theorem 3 that the 3ROM algorithm always guarantees agreement at the 
good nodes in three rounds and regardless of a particular value of F. □ 

Theorem 6. For a fully connected graph with K ≥ 3F+1 nodes and fi ≤ F, the node-fault model
subsumes the link-fault model.

Proof. We are given the assumption that fi ≤ F, i.e., a node either is faulty and induces up to F faults
per round or it is good and experiences no more than F faults per round. Given the link-fault
model, the 3ROM algorithm converts any message from a node Ni to a symmetric message in
three rounds. With the link-fault model, the nodes are considered to be good even though the
faults are manifested on their links. Thus, given up to F faults per node (i.e., the maximum
number of outgoing faulty links per node), a maximum of KF faults per round are tolerated.
With the node-fault model, a maximum of F faulty nodes are assumed to be present with up to F
faults on the outgoing links of a faulty node; thus, a total of F^2 faults per round are tolerated. Since
for F > 0, F^2 < KF, the node-fault model subsumes the link-fault model. □
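
The two per-round fault budgets compared in this proof can be tabulated directly. The helper below is only a restatement of the counts KF and F^2 used above; its name is an assumption made for illustration.

```python
def tolerated_faults_per_round(K: int, F: int):
    """Return (link-fault budget, node-fault budget) for one round."""
    link_model = K * F  # up to F faulty outgoing links at each of K nodes
    node_model = F * F  # up to F faulty nodes, each inducing up to F faults
    return link_model, node_model


# Budgets at the minimum network size K = 3F+1 for F = 1..4.
budgets = [tolerated_faults_per_round(3 * F + 1, F) for F in range(1, 5)]
```

At K = 7 and F = 2, for instance, the link-fault count is 14 while the node-fault count is 4.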

Thus far, we assumed that the Byzantine faulty nodes behave arbitrarily but are limited to 
inducing no more than F faults at any round. We now weaken the assumption of fi ≤ F so that a
faulty node behaves fully arbitrarily in Round 2 and/or Round 3. The 3ROM algorithm still
achieves agreement, but the voting criterion needs to be adjusted to accommodate this weakened
assumption, i.e., 3ROM(K/3, K/3+1). One manifestation of a faulty behavior is for the node to
not broadcast anything during Round 2 and/or Round 3. Note that when a node fails crash-silent, 
fi = K and it can readily be detected from the network-level matrix at the end of Round 3 since its 
corresponding column will have at least K-F zeroes. This diagnosis information can potentially 
be used at the network level. We now show that the 3ROM algorithm still achieves agreement 
when the source is a Byzantine faulty node. 

Theorem 7. For a fully connected graph with K > 3F nodes and the node-fault model, i.e., fi ≤ F,
when the source node is a Byzantine faulty node, 3ROM(K/3, K/3+1) guarantees agreement at 
the good nodes. 

Proof. The source node is a Byzantine faulty node. 

Round 1 - The source node broadcasts a Sync message to at least K-F nodes, where at least K-F†-F‡
are good nodes and receive the message correctly (valid). Let H be this set of good
nodes.

Round 2 - Each node in H broadcasts a Relay message to all other nodes with at least K-F other
nodes receiving the Relay message correctly. Hence, for each node in H, its
corresponding column in M_global has at least K-F non-zero entries.

Round 3 - Since at the end of Round 2, all good nodes receive messages from the nodes in H, at
least K-2F ≥ F+1 messages, all good nodes participate in this round. Each good node
broadcasts its vector of received messages to all other nodes and at least K-F nodes will
receive it correctly. At the end of this round, a node constructs its matrix M, which is
similar to, but likely different from, M_global. The matrix M at a good node Ni is different
from M_global in at most F rows and F columns corresponding to the Byzantine faulty
nodes. Also, the matrix M at a good node Ni is identical to the matrix M at other good
nodes Nj, j = 1..G and j ≠ i, except in the same rows and columns corresponding to the
Byzantine faulty nodes. Therefore, the columns in M corresponding to nodes in H have
at least K-F non-zero entries. Furthermore, the column corresponding to the faulty
source node has at least K-2F non-zero entries (the nodes in H) in M. Since K > 3F
(equivalently, K/3 > F), there are at least K-2F+1 > K/3+1 columns in M (the nodes in H
plus the faulty source) with at least K-2F > K/3 non-zero entries. Therefore, each node
votes “accept” for 3ROM(K/3, K/3+1). □
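
The counting step that closes this proof can be written out explicitly. The display below only restates the inequalities used above, with the bound of at least K-2F nodes in H taken from Round 1; the labels under the braces are added for readability.

```latex
\begin{align*}
K > 3F \;&\Longrightarrow\; F < \tfrac{K}{3} \\
&\Longrightarrow\; K - 2F \;>\; K - \tfrac{2K}{3} \;=\; \tfrac{K}{3} \;=\; \alpha \\
&\Longrightarrow\; \underbrace{(K - 2F)}_{\text{columns of nodes in } H}
   + \underbrace{1}_{\text{faulty source}} \;>\; \tfrac{K}{3} + 1 \;=\; \beta .
\end{align*}
```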

Note that when the source is a good node (node-fault model), since the set H consists of at least
2K/3 good nodes, this weaker assumption holds and Theorems 2 and 7 apply. Also, although
this weaker assumption does not apply to the link-fault model (all nodes are good) and Theorem
1 still holds, since with this weaker assumption β = K/3+1 and K/3+1 < 2K/3,
Theorem 7 applies to both models.

† Up to F good nodes do not receive the message.
‡ Up to F simultaneous faulty nodes.

4.3. Message Observation Window, Agreement Within A Time Bound 

Earlier in this paper we stated that, to simplify the explanation of the problem and our proposed
solution, a transmitted message from a single source arrives at the receiving nodes logically at
the same time. In this section we revisit this assumption and justify it. In an
implementation, as we have explained in Section 3.1, a given message from a single source
arrives at the receiving nodes within d units of each other. Figure 1 is a depiction of a message
through three rounds of the 3ROM algorithm. Ns is the source node, and Ni and Nj represent the
nodes that receive the message at the two extremes of the communication latencies, i.e., D and
D+d = y, respectively, i ≠ j ≠ s. Thus, unless proper measures are taken, consequent relaying of
messages at subsequent rounds widens message arrivals at the nodes for every round by an
additional d. In this figure, arrows indicate the broadcasting and the receiving of a
message, and the labels on these arrows, s, i, and j, correspond to Ns, Ni, and Nj, respectively, as
the initiators of the messages.

For the following lemmas, Ns is the source node initiating Round 1, Ni is the node that receives 
the message at the earliest time, i.e., D, and Nj is the node that receives the message at the latest 
time, i.e., D+d = γ. 

Lemma 8. The message observation window for Round 2 is [γ-2d, γ+d]. 

Proof. Round 2 begins with Ni and Nj relaying the messages they received from Ns. In this 
round, the nodes relay a message as soon as they receive it, i.e., within at most d of each 
other. Thus, at the end of Round 2, the messages arrive at the nodes within 2d of each other. 
Since a node does not physically send a message to itself but uses its own message, its message 
is assumed to arrive at itself at γ to account for the worst-case message delivery time. At the 
end of Round 2, the earliest a message can arrive is at Nj, from the first node that started its 
Round 2, i.e., Ni. As shown on the timeline of activities, this message arrives at the longest delay 
minus the accumulated drift for two rounds, i.e., γ-2d. Similarly, the latest a message can arrive 
is at Ni, from the node that started its Round 2 last, i.e., Nj, at the longest delay plus the 
initial drift between the nodes from the previous round, i.e., γ+d. Thus, the window of 
observation for message arrival for Round 2 is [γ-2d, γ+d]. □ 
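
The bounds in Lemma 8 can be checked numerically. The following is a minimal sketch, not part 
of the original analysis. It assumes that every node starts Round 2 at some time in [D, D+d] 
after the source broadcast, that every link delay lies in [D, D+d], and that arrival offsets are 
measured from the receiving node's own Round 2 start. The function name round2_window and 
the grid resolution are illustrative choices. 

```python
from itertools import product


def round2_window(D, d, steps=4):
    """Return (earliest, latest) Round 2 arrival offsets seen by any receiver.

    Assumptions of this sketch (not taken from the paper's proofs):
    - each node starts Round 2 at a time in [D, D + d],
    - each link delay lies in [D, D + d],
    - offsets are measured from the receiver's own Round 2 start time.
    """
    # Sample the interval [D, D + d] on an evenly spaced grid, end points included.
    grid = [D + d * k / steps for k in range(steps + 1)]
    offsets = [
        sender_start + link_delay - receiver_start
        for sender_start, link_delay, receiver_start in product(grid, repeat=3)
    ]
    return min(offsets), max(offsets)


if __name__ == "__main__":
    D, d = 10, 2
    gamma = D + d
    earliest, latest = round2_window(D, d)
    print(earliest, latest)          # extremes found by enumeration
    print(gamma - 2 * d, gamma + d)  # bounds stated in Lemma 8
```

Under these assumptions the enumerated extremes coincide with γ-2d and γ+d. The earliest offset 
comes from the earliest sender, the shortest link delay, and the latest receiver; the latest 
offset comes from the opposite combination. 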

Lemma 9. All good nodes participating in Round 2 finish Round 2 and start Round 3 within d of 
each other. 

Proof. From Figure 1, from the start of Round 1 to the end of Round 2, the earliest message 
(EM) and latest message (LM) arrival times at Ns are EMs = γ+D-d and LMs = γ+γ, respectively. 
Similarly, for Ni, EMi = D+d+D and LMi = D+d+γ, and for Nj, EMj = γ+D-d and LMj = γ+γ. Simple 
algebraic manipulation gives ΔEM = d and ΔLM = 0 for any two nodes, i.e., the nodes finish Round 
2 and start Round 3 within d of each other. □ 

Lemma 10. The message observation window for Round 3 is [γ-2d, γ+d]. 

Proof. It follows from Lemma 9 that the nodes start Round 3 within d of each other. By an 
argument similar to that of Lemma 8, the observation window for message arrival for Round 3 is 
[γ-2d, γ+d]. □ 

It follows from Lemmas 8 through 10 that at the end of each round the nodes are within d of each 
other, which justifies our assumption that messages arrive logically at the same time. 


Fig. 1. Message observation window = [-2d, d] from γ. 

4.4. Complexity Of The 3ROM Algorithm 

In the 3ROM algorithm, since a node does not send a message to itself, the number of messages 
transmitted per node in each round is (K-1). For the worst-case analysis, all nodes participate 
in Rounds 2 and 3. Thus, the total number of messages transmitted per round is (K-1), K(K-1), 
and K^2(K-1) for Rounds 1, 2, and 3, respectively. Although in Round 3 a node broadcasts a 
vector of messages, for complexity analysis purposes we count each individual message 
separately. Therefore, the total number of exchanged messages is (K-1) + K(K-1) + K^2(K-1), and 
the message complexity of the 3ROM algorithm is O(K^3). However, if a message is indeed 
physically broadcast to all, e.g., when the communication medium is wireless, then the number of 
broadcast messages per node in each round is 1. Thus, the total number of messages broadcast 
per round is 1, (K-1), and K(K-1) for Rounds 1, 2, and 3, respectively, the total number of 
exchanged messages is 1+(K-1)+K(K-1), and the message complexity of the 3ROM algorithm is 
O(K^2). In contrast, the message complexity of the OM algorithm for these two scenarios is 
O(K^F) and O(K^(F-1)), respectively. We would like to emphasize that the number of rounds of 
exchanged messages for the 3ROM algorithm is independent of F. 
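
The per-round counts above can be tallied directly. The following is a small sketch that simply 
evaluates the expressions stated in this section for a given K; the function names are 
illustrative and not part of the algorithm. 

```python
def point_to_point_messages(K):
    """Per-round and total message counts when each message is sent link by link."""
    per_round = [K - 1, K * (K - 1), K**2 * (K - 1)]
    return per_round, sum(per_round)


def broadcast_messages(K):
    """Per-round and total counts when one physical broadcast reaches all nodes."""
    per_round = [1, K - 1, K * (K - 1)]
    return per_round, sum(per_round)


if __name__ == "__main__":
    for K in (4, 7, 10):
        print(K, point_to_point_messages(K), broadcast_messages(K))
```

For these expressions the totals simplify to K^3 - 1 and K^2, respectively, which is consistent 
with the stated O(K^3) and O(K^2) complexities. 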

5. Model Checking 

In this section we present a mechanical verification of the 3ROM algorithm. We chose the model 
checking approach for its ease, feasibility, and quick examination of the problem space, and used 
it to verify the correctness of our formal proof of the algorithm. The Symbolic Model Verifier 
(SMV) [20] was used to model the algorithm. SMV’s language description and modeling capability 
provide a relatively easy translation from the pseudo-code. SMV semantics are those of synchronous 
composition, where all assignments are executed in parallel and synchronously. Thus, a single 
step of the resulting model corresponds to a step in each of the components. 

A number of cases for each fault model were model checked. In particular, for the node-fault 
model, scenarios with F = 0..3 and K = 4..10, respectively, were model checked with the weaker 
assumptions, i.e., with thresholds of F+1 and F+2. Model checking of the link-fault model requires 
that a specific number of link faults be considered. Two cases, with F = 2, K = 7 and F = 3, K = 10, 
were model checked. Model checking of larger graphs with more node and link 
faults can readily be accommodated. The SMV models are listed in Appendices A and B. 

5.1. Model Checked Propositions 

Computation tree logic (CTL), a temporal logic, is used to express properties of a system. In 
CTL, formulas are composed of path quantifiers, E and A, and temporal operators, X, F, G, 
and U [21]. A means “All”: the formula has to hold on all paths starting from the current state. 
F means “Finally”: the formula eventually has to hold (somewhere on the subsequent path). In this 
section the claims of agreement at the good nodes at the end of the third round are examined. The 
node-fault and link-fault models are model checked separately for F = 1, 2, and 3, while the same 
CTL proposition is used to verify that agreement has been reached at all good nodes for both models. 

For model checking of each scenario, a particular node is designated as the source and 
scheduled to initiate the broadcast of a Sync message at a particular time. Since the 3ROM is 
deterministic, the final vote time, VotingResultTime, is set to the end of the 3rd round after the 
broadcast of the initial Sync message. Validation of the CTL proposition requires examination of 
an underlying proposition. In particular, the variable VoteTime is used in these properties and is 
defined here. 

VoteTime = (GlobalClock >= VotingResultTime) ; 

The GlobalClock is a measure of elapsed time from the beginning of the operation with respect 
to the real time, i.e., external view. The VoteTime indicates that the GlobalClock has reached its 
target value of VotingResultTime, and the GlobalAgreement is defined as the conjunction of 
voting results at all good nodes. 

Proposition SystemLiveness : AF (VoteTime) 

This proposition addresses the liveness of the system: whether time advances and the elapsed 
time reaches VotingResultTime, i.e., passes beyond the broadcast of the message and the 
three rounds needed to reach agreement on that message. 
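
The meaning of AF (VoteTime) can be illustrated on a small explicit-state model. The sketch 
below is written in Python rather than SMV and only illustrates the CTL semantics; it is not the 
verification used in this paper. It assumes a single saturating counter as the only state 
variable and a total transition relation, and it takes clock_max = 30 and a vote time of 28 from 
the defines in Appendix B with P = 25. The function names are hypothetical. 

```python
def af_holds(states, successors, prop, initial):
    """Check AF prop at `initial` by least-fixpoint iteration.

    A state satisfies AF prop if prop holds there, or if all of its
    successors already satisfy AF prop. Every state is assumed to have
    at least one successor.
    """
    satisfied = {s for s in states if prop(s)}
    changed = True
    while changed:
        changed = False
        for s in states:
            if s not in satisfied and all(t in satisfied for t in successors(s)):
                satisfied.add(s)
                changed = True
    return initial in satisfied


def clock_successors(clock_max):
    """Saturating counter: increment until clock_max, then stay there."""
    def successors(c):
        return [c + 1] if c < clock_max else [c]
    return successors


if __name__ == "__main__":
    clock_max, vote_time = 30, 28
    states = range(clock_max + 1)
    print(af_holds(states, clock_successors(clock_max), lambda c: c >= vote_time, 0))
```

In this toy model the property holds exactly when the vote time does not exceed the maximum 
clock value, because the counter stops advancing once it saturates. 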

Proposition GlobalAgreement : AF (VoteTime & GlobalAgreement ) 

This proposition encompasses the criteria for the agreement property as well as the claim of 
determinism. The proposition specifies whether or not the system will reach agreement in the 
three rounds after the message was initially broadcast. This property is expected to hold. 

In satisfying the Agreement Property, and its related GlobalAgreement proposition, both node- 
fault and link-fault models had to be included, i.e., faulty transmitters and faulty links were 
model checked. Since this proposition includes the Validity Property (i.e., nonfaulty 
transmitter), a separate proposition was not needed. 
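
The voting step that GlobalAgreement relies on can be sketched as follows. This is a Python 
paraphrase of the VoteResult definition listed in Appendix A, under the assumption that row r of 
the matrix is the message vector reported by node r and that column c records which nodes 
reported hearing from node c. The function names and the example matrix are illustrative only. 

```python
def vote(matrix, F, min_columns):
    """Accept when at least `min_columns` columns hold at least F + 1 ones."""
    K = len(matrix)
    strong_columns = sum(
        1
        for col in range(K)
        if sum(matrix[row][col] for row in range(K)) >= F + 1
    )
    return strong_columns >= min_columns


def global_agreement(votes):
    """Conjunction of the voting results at the good nodes."""
    return all(votes)


def example_matrix(K, good):
    """Good nodes report hearing from every good node; other rows stay zero."""
    return [[1 if r < good and c < good else 0 for c in range(K)] for r in range(K)]


if __name__ == "__main__":
    K, F, good = 10, 3, 7
    M = example_matrix(K, good)
    votes = [vote(M, F, 2 * F + 1) for _ in range(good)]
    print(global_agreement(votes))
```

The threshold on the number of columns is passed as a parameter so that both values appearing in 
Appendix A, 2F+1 and F+2, can be tried on the same matrix. 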

The model checking results of the bounded model of the algorithm have verified the correctness 
of the algorithm for fully connected networks with K ≥ 3F + 1 nodes, for both node-fault and 
link-fault models, and for the following scenarios: F = 0, 1, 2, 3 simultaneous faults and K = 4, 4, 
7, and 10, respectively. In addition, the results have confirmed the claims of determinism and 
independence of the algorithm from F. 

6. Conclusions 

Distributed systems have become an integral part of safety-critical computing applications, 
necessitating system designs that incorporate complex fault-tolerant resource management 
functions to provide globally coordinated operations with ultra-reliability. As a result, robust 
clock synchronization has become a required fundamental component of fault-tolerant safety- 
critical distributed systems. The main issue in solving the clock synchronization problem for the 
general case is a lack of a symmetric view in the system at the participating good nodes. We first 
enumerated several ways of achieving message symmetry across the system, and then presented 
an alternative, referred to as the 3ROM algorithm, that guarantees agreement in a system in three 
rounds. The 3ROM assumes each node Ni, i = 1..K, either induces up to F faults if it is a 
faulty node, or experiences no more than F faults if it is a good node, and in addition, the 
maximum number of simultaneous faults in the network is limited to F. The algorithm is based 
on the Oral Message algorithm of Lamport et al., is scalable with respect to the number of nodes 
in the system, and applies equally to the traditional node-fault model as well as the link-fault 
model. The 3ROM is independent of the fault model (node-fault or link-fault), is 
independent of the number of faults (in terms of the number of required rounds, not the number of 
messages), and has a message complexity of O(K^3). We also presented a mechanical verification 
of the algorithm for up to three simultaneous Byzantine faults. The model-checking effort was 
focused on verifying the correctness of a bounded model of the algorithm as well as confirming 
claims of determinism. The underlying topology in this paper was a fully connected graph. We 
leave the generalization of our solution to other topologies, including an arbitrary graph that 
meets the minimum requirements on the number of nodes and connectivity, to future work. 

Appendix A 

-- File Name: 3ROM_NodeFault_K10.smv 
-- 
-- This file is for a system of K good nodes, where F > 1. 
-- Note that the graph is fully connected. 
-- 
-- Environment : SMV 
-- Organization: NASA Langley Research Center 
-- Project: Self-Stablization 
-- Authors: Malekpour, Mahyar 
--          NASA Langley Research Center 
--          Hampton, VA 23681-2199 
-- Creation Date: 9/2/2014 
-- 
-- This SMV description is property of the National Aeronautics and Space 
-- Administration. Unauthorized use or duplication of this VHDL description is 
-- strictly prohibited. Authorized users are subject to the following 
-- restrictions: 
-- 
-- . Neither the author, their corporation, nor NASA is responsible for any 
--   consequence of the use of this SMV description. 
-- . The origin of this SMV description must not be misrepresented either 
--   by explicit claim or by omission. 
-- . Altered versions of this SMV description must be plainly marked as 
--   such. 
-- . This notice may not be removed or altered. 
-- 
-- Modified on: 6/3/2014 
-- by: Mahyar Malekpour 


-- Global Constants: 

#define Node_Id1 1 
#define Node_Id2 2 
#define Node_Id3 3 
#define Node_Id4 4 
#define Node_Id5 5 
#define Node_Id6 6 
#define Node_Id7 7 
#define Node_Id8 8 
#define Node_Id9 9 
#define Node_Id10 10 

#define SourceNodeId (Node_Id1) 

-- Drift in units of Gamma 
#define DriftP (5) 

-- Network Size 
#define K 10 
#define G 7 
#define F 3 
#define FPlusOne (F + 1) 
#define FPlusTwo (F + 2) 
#define TwoFPlusOne (2 * F + 1) 

-- Topology = fully connected graph 
#define P (20 + Drift 
#define GlobalClockMax (P- 

-- P of SourceNodeId + 3 rounds 
#define TimeToVote (P + 

-- Drift at the local level 
#define P_N1 (P-0) 
#define P_N2 (P-1) 
#define P_N3 (P-2) 
#define P_N4 (P-3) 
#define P_N5 (P-4) 
#define P_N6 (P + 1) 
#define P_N7 (P + 2) 
#define P_N8 (P + 3) 
#define P_N9 (P + 4) 
#define P_N10 (P + 5) 


MODULE main 
VAR 

Global_Clock : 0 .. GlobalClockMax ; 

FaultyNode_1 : FaultyNode (Node_Id8, P_N8, Global_Clock, SourceNodeId, 
Node_1.MessageOut, 

Node_2.MessageOut, 

Node_3. MessageOut, 

Node_4. MessageOut, 

Node_5. MessageOut, 

Node_6. MessageOut, 

Node_7. MessageOut, 

Node_8. MessageOut, 

Node_9. MessageOut, 

Node_10.MessageOut) ; 

FaultyNode_1.MessageOut_8, 

FaultyNode_2.MessageOut_9, 

FaultyNode_3.MessageOut_10) ; 

FaultyNode_2 : FaultyNode (Node_Id9, P_N9, Global_Clock, SourceNodeId, 
Node_1.MessageOut, 

Node_2. MessageOut, 

Node_3. MessageOut, 

Node_4. MessageOut, 

Node_5. MessageOut, 

Node_6. MessageOut, 

Node_7. MessageOut, 

Node_8. MessageOut, 

Node_9. MessageOut, 

Node_10.MessageOut) ; 

FaultyNode_1.MessageOut_8, 

FaultyNode_2.MessageOut_9, 

FaultyNode_3.MessageOut_10) ; 

FaultyNode_3 : FaultyNode (Node_Id10, P_N10, Global_Clock, SourceNodeId, 
Node_1.MessageOut, 

Node_2. MessageOut, 

Node_3. MessageOut, 

Node_4. MessageOut, 

Node_5. MessageOut, 

Node_6. MessageOut, 

Node_7. MessageOut, 

Node_8. MessageOut, 

Node_9. MessageOut, 

Node_10.MessageOut) ; 

FaultyNode_1.MessageOut_8, 

FaultyNode_2.MessageOut_9, 

FaultyNode_3.MessageOut_10) ; 


Node_1 : Node (Node_Id1, P_N1, Global_Clock, SourceNodeId, 

Node_1.MessageOut, 

Node_2. MessageOut, 

Node_3. MessageOut, 

Node_4. MessageOut, 

Node_5. MessageOut, 

Node_6. MessageOut, 

Node_7. MessageOut, 

Node_8. MessageOut, 

Node_9. MessageOut, 

Node_10. MessageOut) ; 

FaultyNode_1 .MessageOut_1 , 

FaultyNode_2.MessageOut_1 , 

FaultyNode_3.MessageOut_1 , 

Node_1.MsgVector, Node_2.MsgVector, Node_3.MsgVector, Node_4.MsgVector, 
Node_5.MsgVector, Node_6.MsgVector, Node_7.MsgVector, 

Node_8.MsgVector, Node_9.MsgVector, Node_10.MsgVector) ; 
FaultyNode_1.MsgVector, FaultyNode_2.MsgVector, FaultyNode_3.MsgVector) ; 

Node_2 : Node (Node_Id2, P_N2, Global_Clock, SourceNodeId, 

Node_1.MessageOut, 

Node_2. MessageOut, 

Node_3. MessageOut, 

Node_4. MessageOut, 

Node_5. MessageOut, 

Node_6. MessageOut, 

Node_7. MessageOut, 

Node_8. MessageOut, 

Node_9. MessageOut, 

Node_10. MessageOut) ; 

FaultyNode_1 .MessageOut_2, 

FaultyNode_2.MessageOut_2, 

FaultyNode_3.MessageOut_2, 

Node_1.MsgVector, Node_2.MsgVector, Node_3.MsgVector, Node_4.MsgVector, 
Node_5.MsgVector, Node_6.MsgVector, Node_7.MsgVector, 

Node_8.MsgVector, Node_9.MsgVector, Node_10.MsgVector) ; 
FaultyNode_1.MsgVector, FaultyNode_2.MsgVector, FaultyNode_3.MsgVector) ; 

Node_3 : Node (Node_Id3, P_N3, Global_Clock, SourceNodeId, 

Node_1.MessageOut, 

Node_2. MessageOut, 

Node_3. MessageOut, 

Node_4. MessageOut, 

Node_5. MessageOut, 

Node_6. MessageOut, 

Node_7. MessageOut, 

Node_8. MessageOut, 

Node_9. MessageOut, 

Node_10. MessageOut) ; 

FaultyNode_1 .MessageOut_3, 

FaultyNode_2.MessageOut_3, 

FaultyNode_3.MessageOut_3, 

Node_1.MsgVector, Node_2.MsgVector, Node_3.MsgVector, Node_4.MsgVector, 
Node_5.MsgVector, Node_6.MsgVector, Node_7.MsgVector, 

Node_8.MsgVector, Node_9.MsgVector, Node_10.MsgVector) ; 
FaultyNode_1.MsgVector, FaultyNode_2.MsgVector, FaultyNode_3.MsgVector) ; 

Node_4 : Node (Node_Id4, P_N4, Global_Clock, SourceNodeId, 

Node_1.MessageOut, 

Node_2. MessageOut, 

Node_3. MessageOut, 

Node_4. MessageOut, 

Node_5. MessageOut, 

Node_6. MessageOut, 

Node_7. MessageOut, 

Node_8. MessageOut, 
Node_9.MessageOut, 

Node_10.MessageOut) ; 

FaultyNode_1 .MessageOut_4, 

FaultyNode_2.MessageOut_4, 

FaultyNode_3.MessageOut_4, 

Node_1.MsgVector, Node_2.MsgVector, Node_3.MsgVector, Node_4.MsgVector, 
Node_5.MsgVector, Node_6.MsgVector, Node_7.MsgVector, 

Node_8.MsgVector, Node_9.MsgVector, Node_10.MsgVector) ; 
FaultyNode_1.MsgVector, FaultyNode_2.MsgVector, FaultyNode_3.MsgVector) ; 

Node_5 : Node (Node_Id5, P_N5, Global_Clock, SourceNodeId, 

Node_1.MessageOut, 

Node_2. MessageOut, 

Node_3. MessageOut, 

Node_4. MessageOut, 

Node_5. MessageOut, 

Node_6. MessageOut, 

Node_7. MessageOut, 

Node_8. MessageOut, 

Node_9. MessageOut, 

Node_10. MessageOut) ; 

FaultyNode_1 .MessageOut_5, 

FaultyNode_2.MessageOut_5, 

FaultyNode_3.MessageOut_5, 

Node_1.MsgVector, Node_2.MsgVector, Node_3.MsgVector, Node_4.MsgVector, 
Node_5.MsgVector, Node_6.MsgVector, Node_7.MsgVector, 

Node_8.MsgVector, Node_9.MsgVector, Node_10.MsgVector) ; 
FaultyNode_1.MsgVector, FaultyNode_2.MsgVector, FaultyNode_3.MsgVector) ; 

Node_6 : Node (Node_Id6, P_N6, Global_Clock, SourceNodeId, 

Node_1.MessageOut, 

Node_2. MessageOut, 

Node_3. MessageOut, 

Node_4. MessageOut, 

Node_5. MessageOut, 

Node_6. MessageOut, 

Node_7. MessageOut, 

Node_8. MessageOut, 

Node_9. MessageOut, 

Node_10. MessageOut) ; 

FaultyNode_1 .MessageOut_6, 

FaultyNode_2.MessageOut_6, 

FaultyNode_3.MessageOut_6, 

Node_1.MsgVector, Node_2.MsgVector, Node_3.MsgVector, Node_4.MsgVector, 
Node_5.MsgVector, Node_6.MsgVector, Node_7.MsgVector, 

Node_8.MsgVector, Node_9.MsgVector, Node_10.MsgVector) ; 
FaultyNode_1.MsgVector, FaultyNode_2.MsgVector, FaultyNode_3.MsgVector) ; 

Node_7 : Node (Node_Id7, P_N7, Global_Clock, SourceNodeId, 

Node_1.MessageOut, 

Node_2. MessageOut, 

Node_3. MessageOut, 

Node_4. MessageOut, 

Node_5. MessageOut, 

Node_6. MessageOut, 

Node_7. MessageOut, 

Node_8. MessageOut, 

Node_9. MessageOut, 

Node_10. MessageOut) ; 

FaultyNode_1.MessageOut_7, 

FaultyNode_2.MessageOut_7, 

FaultyNode_3.MessageOut_7, 

Node_1.MsgVector, Node_2.MsgVector, Node_3.MsgVector, Node_4.MsgVector, 
Node_5.MsgVector, Node_6.MsgVector, Node_7.MsgVector, 

Node_8.MsgVector, Node_9.MsgVector, Node_10.MsgVector) ; 
FaultyNode_1.MsgVector, FaultyNode_2.MsgVector, FaultyNode_3.MsgVector) ; 

Node_8 : Node (Node_Id8, P_N8, Global_Clock, SourceNodeId, 

Node_1.MessageOut, 

Node_2. MessageOut, 
Node_3.MessageOut, 

Node_4.MessageOut, 

Node_5.MessageOut, 

Node_6. MessageOut, 

Node_7.MessageOut, 

Node_8. MessageOut, 

Node_9. MessageOut, 

Node_10.MessageOut, 

FaultyNode_1 .MessageOut_7, 

FaultyNode_2.MessageOut_7, 

FaultyNode_3.MessageOut_7, 

Node_1.MsgVector, Node_2.MsgVector, Node_3.MsgVector, Node_4.MsgVector, 
Node_5.MsgVector, Node_6.MsgVector, Node_7.MsgVector, 

Node_8.MsgVector, Node_9.MsgVector, Node_10.MsgVector) ; 

Node_9 : Node (Node_Id9, P_N9, Global_Clock, SourceNodeId, 

Node_1.MessageOut, 

Node_2.MessageOut, 

Node_3. MessageOut, 

Node_4. MessageOut, 

Node_5. MessageOut, 

Node_6. MessageOut, 

Node_7. MessageOut, 

Node_8. MessageOut, 

Node_9. MessageOut, 

Node_10. MessageOut, 

FaultyNode_1 .MessageOut_7, 

FaultyNode_2.MessageOut_7, 

FaultyNode_3.MessageOut_7, 

Node_1.MsgVector, Node_2.MsgVector, Node_3.MsgVector, Node_4.MsgVector, 
Node_5.MsgVector, Node_6.MsgVector, Node_7.MsgVector, 

Node_8.MsgVector, Node_9.MsgVector, Node_10.MsgVector) ; 

Node_10 : Node (Node_Id10, P_N10, Global_Clock, SourceNodeId, 

Node_1.MessageOut, 

Node_2. MessageOut, 

Node_3. MessageOut, 

Node_4. MessageOut, 

Node_5. MessageOut, 

Node_6. MessageOut, 

Node_7. MessageOut, 

Node_8. MessageOut, 

Node_9. MessageOut, 

Node_10. MessageOut, 

FaultyNode_1 .MessageOut_7, 

FaultyNode_2.MessageOut_7, 

FaultyNode_3.MessageOut_7, 

Node_1.MsgVector, Node_2.MsgVector, Node_3.MsgVector, Node_4.MsgVector, 
Node_5.MsgVector, Node_6.MsgVector, Node_7.MsgVector, 

Node_8.MsgVector, Node_9.MsgVector, Node_10.MsgVector) ; 


Agreement : boolean ; 


DEFINE 

VoteTime := (Global_Clock >= TimeToVote) ; 

-- For all good nodes and good links. 
GlobalAgreement := (Global_Clock = TimeToVote) & 
    (Node_1.VoteResult & Node_2.VoteResult & 
     Node_3.VoteResult & Node_4.VoteResult & 
     Node_5.VoteResult & Node_6.VoteResult & 
     Node_7.VoteResult & Node_8.VoteResult & 
     Node_9.VoteResult & Node_10.VoteResult) ; 

-- For only the good nodes with the node-fault model, i.e., faulty nodes but no faulty links. 
GlobalAgreementNodeFault := 
    (Global_Clock = TimeToVote) & 
    (Node_1.VoteResult & Node_2.VoteResult & 
     Node_3.VoteResult & Node_4.VoteResult & 
     Node_5.VoteResult & Node_6.VoteResult & 
     Node_7.VoteResult) ; 


ASSIGN 

init (Global_Clock) := 0 ; 
next (Global_Clock) := 
    case 
        (Global_Clock < GlobalClockMax) : Global_Clock + 1 ; 
        1 : Global_Clock ; 
    esac ; 


SPEC 

-- Proposition #1: 
-- AF (VoteTime) 

-- Proposition #2: 
AF (VoteTime & GlobalAgreementNodeFault) -- true 


-- end of main 


-- Faulty node. 


MODULE FaultyNode (Node_Id, MyP, Global_Clock, SourceNode, 
    N1_Msg, N2_Msg, N3_Msg, N4_Msg, N5_Msg, N6_Msg, N7_Msg, N8_Msg, N9_Msg, N10_Msg) 

VAR 

    MessageOut_1 : {NONE, Sync, Relay} ; 
    MessageOut_2 : {NONE, Sync, Relay} ; 
    MessageOut_3 : {NONE, Sync, Relay} ; 
    MessageOut_4 : {NONE, Sync, Relay} ; 
    MessageOut_5 : {NONE, Sync, Relay} ; 
    MessageOut_6 : {NONE, Sync, Relay} ; 
    MessageOut_7 : {NONE, Sync, Relay} ; 
    MessageOut_8 : {NONE, Sync, Relay} ; 
    MessageOut_9 : {NONE, Sync, Relay} ; 
    MessageOut_10 : {NONE, Sync, Relay} ; 

    MsgVector : array 1..10 of {0, 1} ; 
    VoteResult : boolean ; 

DEFINE 

    Count_Sync := 
        ((N1_Msg = Sync) + 
         (N2_Msg = Sync) + 
         (N3_Msg = Sync) + 
         (N4_Msg = Sync) + 
         (N5_Msg = Sync) + 
         (N6_Msg = Sync) + 
         (N7_Msg = Sync) + 
         (N8_Msg = Sync) + 
         (N9_Msg = Sync) + 
         (N10_Msg = Sync)) ; 

ASSIGN 

    init (MessageOut_1) := NONE ; 
    init (MessageOut_2) := NONE ; 
    init (MessageOut_3) := NONE ; 
    init (MessageOut_4) := NONE ; 
    init (MessageOut_5) := NONE ; 
    init (MessageOut_6) := NONE ; 
    init (MessageOut_7) := NONE ; 
    init (MessageOut_8) := NONE ; 
    init (MessageOut_9) := NONE ; 
    init (MessageOut_10) := NONE ; 


    next (MessageOut_1) := 
        case 
            (Global_Clock = MyP) & (SourceNode = Node_Id) : Sync ; 
            !(SourceNode = Node_Id) & (Count_Sync > 0) : {NONE, Relay} ; 
            1 : NONE ; 
        esac ; 

    next (MessageOut_2) := 
        case 
            (Global_Clock = MyP) & (SourceNode = Node_Id) : Sync ; 
            !(SourceNode = Node_Id) & (Count_Sync > 0) : {NONE, Relay} ; 
            1 : NONE ; 
        esac ; 

    next (MessageOut_3) := 
        case 
            (Global_Clock = MyP) & (SourceNode = Node_Id) : Sync ; 
            !(SourceNode = Node_Id) & (Count_Sync > 0) : {NONE, Relay} ; 
            1 : NONE ; 
        esac ; 

    next (MessageOut_4) := 
        case 
            (Global_Clock = MyP) & (SourceNode = Node_Id) : Sync ; 
            !(SourceNode = Node_Id) & (Count_Sync > 0) : {NONE, Relay} ; 
            1 : NONE ; 
        esac ; 

    next (MessageOut_5) := 
        case 
            (Global_Clock = MyP) & (SourceNode = Node_Id) : NONE ; 
            !(SourceNode = Node_Id) & (Count_Sync > 0) : {NONE, Relay} ; 
            1 : NONE ; 
        esac ; 

    next (MessageOut_6) := 
        case 
            (Global_Clock = MyP) & (SourceNode = Node_Id) : NONE ; 
            !(SourceNode = Node_Id) & (Count_Sync > 0) : {NONE, Relay} ; 
            1 : NONE ; 
        esac ; 

    next (MessageOut_7) := 
        case 
            (Global_Clock = MyP) & (SourceNode = Node_Id) : NONE ; 
            !(SourceNode = Node_Id) & (Count_Sync > 0) : {NONE, Relay} ; 
            1 : NONE ; 
        esac ; 

    next (MessageOut_8) := 
        case 
            (Global_Clock = MyP) & (SourceNode = Node_Id) : NONE ; 
            !(SourceNode = Node_Id) & (Count_Sync > 0) : {NONE, Relay} ; 
            1 : NONE ; 
        esac ; 

    next (MessageOut_9) := 
        case 
            (Global_Clock = MyP) & (SourceNode = Node_Id) : NONE ; 
            !(SourceNode = Node_Id) & (Count_Sync > 0) : {NONE, Relay} ; 
            1 : NONE ; 
        esac ; 

    next (MessageOut_10) := 
        case 
            (Global_Clock = MyP) & (SourceNode = Node_Id) : NONE ; 
            !(SourceNode = Node_Id) & (Count_Sync > 0) : {NONE, Relay} ; 
            1 : NONE ; 
        esac ; 


-- end of FaultyNode 


-- Good node, for node-fault model. 


MODULE Node (Node_Id, MyP, Global_Clock, SourceNode, 
N1_Msg, N2_Msg, N3_Msg, N4_Msg, N5_Msg, N6_Msg, N7_Msg, N8_Msg, N9_Msg, N10_Msg, 
N1_MsgVector, N2_MsgVector, N3_MsgVector, N4_MsgVector, N5_MsgVector, 
N6_MsgVector, N7_MsgVector, N8_MsgVector, N9_MsgVector, N10_MsgVector) 

VAR 

RoundNum : 0 .. 3 ; 

MessageOut : {NONE, Sync, Relay} ; 

-- Messages received (Rx) from the nodes are saved in this vector and will be 
-- broadcast/used during round 3. 
MsgVector : array 1 .. 10 of {0, 1} ; 

VoteResult : boolean ; 

DEFINE 


Count_Sync := 

((N1_Msg = Sync) + 
(N2_Msg = Sync) + 
(N3_Msg = Sync) + 
(N4_Msg = Sync) + 
(N5_Msg = Sync) + 
(N6_Msg = Sync) + 
(N7_Msg = Sync) + 
(N8_Msg = Sync) + 
(N9_Msg = Sync) + 
(N10_Msg = Sync)) ; 

Count_Relay := 

((N1_Msg = Relay) + 
(N2_Msg = Relay) + 
(N3_Msg = Relay) + 
(N4_Msg = Relay) + 
(N5_Msg = Relay) + 
(N6_Msg = Relay) + 
(N7_Msg = Relay) + 
(N8_Msg = Relay) + 
(N9_Msg = Relay) + 
(N10_Msg = Relay)) ; 

Count_MyVector := 
(MsgVector [1] + 
MsgVector [2] + 
MsgVector [3] + 
MsgVector [4] + 
MsgVector [5] + 
MsgVector [6] + 
MsgVector [7] + 
MsgVector [8] + 
MsgVector [9] + 
MsgVector [10]) ; 

Count_MatrixColumn_1 := 
(N1_MsgVector [1] + 
N2_MsgVector [1] + 
N3_MsgVector [1] + 
N4_MsgVector [1] + 
N5_MsgVector [1] + 
N6_MsgVector [1] + 
N7_MsgVector [1] + 
N8_MsgVector [1] + 
N9_MsgVector [1] + 
N10_MsgVector [1]) ; 

Count_MatrixColumn_2 := 
(N1_MsgVector [2] + 
N2_MsgVector [2] + 
N3_MsgVector [2] + 
N4_MsgVector [2] + 
N5_MsgVector [2] + 
N6_MsgVector [2] + 
N7_MsgVector [2] + 
N8_MsgVector [2] + 
N9_MsgVector [2] + 
N10_MsgVector [2]) ; 

Count_MatrixColumn_3 := 
(N1_MsgVector [3] + 
N2_MsgVector [3] + 
N3_MsgVector [3] + 
N4_MsgVector [3] + 
N5_MsgVector [3] + 
N6_MsgVector [3] + 
N7_MsgVector [3] + 
N8_MsgVector [3] + 
N9_MsgVector [3] + 
N10_MsgVector [3]) ; 

Count_MatrixColumn_4 := 
(N1_MsgVector [4] + 
N2_MsgVector [4] + 
N3_MsgVector [4] + 
N4_MsgVector [4] + 
N5_MsgVector [4] + 
N6_MsgVector [4] + 
N7_MsgVector [4] + 
N8_MsgVector [4] + 
N9_MsgVector [4] + 
N10_MsgVector [4]) ; 

Count_MatrixColumn_5 := 
(N1_MsgVector [5] + 
N2_MsgVector [5] + 
N3_MsgVector [5] + 
N4_MsgVector [5] + 
N5_MsgVector [5] + 
N6_MsgVector [5] + 
N7_MsgVector [5] + 
N8_MsgVector [5] + 
N9_MsgVector [5] + 
N10_MsgVector [5]) ; 

Count_MatrixColumn_6 := 
(N1_MsgVector [6] + 
N2_MsgVector [6] + 
N3_MsgVector [6] + 
N4_MsgVector [6] + 
N5_MsgVector [6] + 
N6_MsgVector [6] + 
N7_MsgVector [6] + 
N8_MsgVector [6] + 
N9_MsgVector [6] + 
N10_MsgVector [6]) ; 

Count_MatrixColumn_7 := 
(N1_MsgVector [7] + 
N2_MsgVector [7] + 
N3_MsgVector [7] + 
N4_MsgVector [7] + 
N5_MsgVector [7] + 
N6_MsgVector [7] + 
N7_MsgVector [7] + 
N8_MsgVector [7] + 
N9_MsgVector [7] + 
N10_MsgVector [7]) ; 

Count_MatrixColumn_8 := 
(N1_MsgVector [8] + 
N2_MsgVector [8] + 
N3_MsgVector [8] + 
N4_MsgVector [8] + 
N5_MsgVector [8] + 
N6_MsgVector [8] + 
N7_MsgVector [8] + 
N8_MsgVector [8] + 
N9_MsgVector [8] + 
N10_MsgVector [8]) ; 

Count_MatrixColumn_9 := 
(N1_MsgVector [9] + 
N2_MsgVector [9] + 
N3_MsgVector [9] + 
N4_MsgVector [9] + 
N5_MsgVector [9] + 
N6_MsgVector [9] + 
N7_MsgVector [9] + 
N8_MsgVector [9] + 
N9_MsgVector [9] + 
N10_MsgVector [9]) ; 

Count_MatrixColumn_10 := 
(N1_MsgVector [10] + 
N2_MsgVector [10] + 
N3_MsgVector [10] + 
N4_MsgVector [10] + 
N5_MsgVector [10] + 
N6_MsgVector [10] + 
N7_MsgVector [10] + 
N8_MsgVector [10] + 
N9_MsgVector [10] + 
N10_MsgVector [10]) ; 



VoteResult := 
(RoundNum = 3) & 
(((Count_MatrixColumn_1 >= FPlusOne) + 
(Count_MatrixColumn_2 >= FPlusOne) + 
(Count_MatrixColumn_3 >= FPlusOne) + 
(Count_MatrixColumn_4 >= FPlusOne) + 
(Count_MatrixColumn_5 >= FPlusOne) + 
(Count_MatrixColumn_6 >= FPlusOne) + 
(Count_MatrixColumn_7 >= FPlusOne) + 
(Count_MatrixColumn_8 >= FPlusOne) + 
(Count_MatrixColumn_9 >= FPlusOne) + 
(Count_MatrixColumn_10 >= FPlusOne)) >= TwoFPlusOne) ; -- For faulty nodes participating in rounds 2 and 3. 
-- (Count_MatrixColumn_10 >= FPlusOne)) >= FPlusTwo) ; -- For faulty nodes going silent in rounds 2 and 3. 


ASSIGN 


init (RoundNum) := 0 ; 

-- init (MessageOut) := {NONE, Sync, Relay} ; 
init (MessageOut) := NONE ; 

init (MsgVector [1]) := 0 ; 
init (MsgVector [2]) := 0 ; 
init (MsgVector [3]) := 0 ; 
init (MsgVector [4]) := 0 ; 
init (MsgVector [5]) := 0 ; 
init (MsgVector [6]) := 0 ; 
init (MsgVector [7]) := 0 ; 
init (MsgVector [8]) := 0 ; 
init (MsgVector [9]) := 0 ; 
init (MsgVector [10]) := 0 ; 


next (RoundNum) := 
    case 
        (RoundNum = 3) : 0 ; 
        (Global_Clock = MyP) & (SourceNode = Node_Id) : 1 ; 
        (Count_Sync > 0) : 2 ; 
        -- If at least F Relays and one Sync, from the source of course, then Round = 3. 
        ((Count_Relay + MsgVector [SourceNode]) >= FPlusOne) : 3 ; 
        -- Or, alternatively, the following will do. 
        (Count_Relay + Count_MyVector >= FPlusOne) : 3 ; 
        1 : RoundNum ; 
    esac ; 


next (MessageOut) := 
    case 
        (Global_Clock = MyP) & (SourceNode = Node_Id) : Sync ; 
        (Count_Sync > 0) : Relay ; 
        1 : NONE ; 
    esac ; 


next (MsgVector [1]) := 
case 

(N1_Msg = Sync) | (N1_Msg = Relay) : 1 ; 

1 : MsgVector [1] ; 
esac ; 

next (MsgVector [2]) := 
case 

(N2_Msg = Sync) | (N2_Msg = Relay) : 1 ; 

1 : MsgVector [2] ; 
esac ; 

next (MsgVector [3]) := 
case 
(N3_Msg = Sync) | (N3_Msg = Relay) : 1 ; 

1 : MsgVector [3] ; 
esac ; 

next (MsgVector [4]) := 
case 

(N4_Msg = Sync) | (N4_Msg = Relay) : 1 ; 

1 : MsgVector [4] ; 
esac ; 

next (MsgVector [5]) := 
case 

(N5_Msg = Sync) | (N5_Msg = Relay) : 1 ; 

1 : MsgVector [5] ; 
esac ; 

next (MsgVector [6]) := 
case 

(N6_Msg = Sync) | (N6_Msg = Relay) : 1 ; 

1 : MsgVector [6] ; 
esac ; 

next (MsgVector [7]) := 
case 

(N7_Msg = Sync) | (N7_Msg = Relay) : 1 ; 

1 : MsgVector [7] ; 
esac ; 

next (MsgVector [8]) := 
case 

(N8_Msg = Sync) | (N8_Msg = Relay) : 1 ; 

1 : MsgVector [8] ; 
esac ; 

next (MsgVector [9]) := 
case 

(N9_Msg = Sync) | (N9_Msg = Relay) : 1 ; 

1 : MsgVector [9] ; 
esac ; 

next (MsgVector [10]) := 
case 

(N10_Msg = Sync) | (N10_Msg = Relay) : 1 ; 

1 : MsgVector [10] ; 
esac ; 


-- end of Node 

Appendix B 

-- File Name: 3ROM_LinkFault_k10.smv 
-- 
-- This file is for a system of K good nodes, where F > 1. 
-- Note that the graph is fully connected. 
-- 
-- Environment : SMV 
-- Organization: NASA Langley Research Center 
-- Project: Self-Stablization 
-- Authors: Malekpour, Mahyar 
--          NASA Langley Research Center 
--          Hampton, VA 23681-2199 
-- Creation Date: 9/4/2014 
-- 
-- This SMV description is property of the National Aeronautics and Space 
-- Administration. Unauthorized use or duplication of this VHDL description is 
-- strictly prohibited. Authorized users are subject to the following 
-- restrictions: 
-- 
-- . Neither the author, their corporation, nor NASA is responsible for any 
--   consequence of the use of this SMV description. 
-- . The origin of this SMV description must not be misrepresented either 
--   by explicit claim or by omission. 
-- . Altered versions of this SMV description must be plainly marked as 
--   such. 
-- . This notice may not be removed or altered. 
-- 
-- Modified on: 9/4/2014 
-- by: Mahyar Malekpour 


-- Global Constants: 

#define Node_Id1 1 
#define Node_Id2 2 
#define Node_Id3 3 
#define Node_Id4 4 
#define Node_Id5 5 
#define Node_Id6 6 
#define Node_Id7 7 
#define Node_Id8 8 
#define Node_Id9 9 
#define Node_Id10 10 

#define SourceNodeId (Node_Id1) 

-- Drift in units of Gamma 
#define DriftP (5) 

-- Network Size 
#define K 10 
#define G 7 
#define F 3 
#define FPlusOne (F + 1) 
#define FPlusTwo (F + 2) 
#define TwoFPlusOne (2 * F - 

-- Topology = fully connected graph 
#define P (20 + DriftP) 

#define GlobalClockMax (P + 5) 

-- P of SourceNodeId + 3 rounds 
#define TimeToVote (P + 3) 

-- Drift at the local level 
#define P_N1 (P-0) 
#define P_N2 (P-1) 
#define P_N3 (P-2) 
#define P_N4 (P-3) 
#define P_N5 (P-4) 
#define P_N6 (P + 1) 
#define P_N7 (P + 2) 
#define P_N8 (P + 3) 
#define P_N9 (P + 4) 
#define P_N10 (P + 5) 


MODULE main 
VAR 

Global_Clock : 0 .. GlobalClockMax ; 


Node_1 : Node (Node_Id1, P_N1, Global_Clock, SourceNodeId, 
Node_1.MessageOut, 
Node_2.MessageOut, 
Node_3.MessageOut, 
Node_4.MessageOut, 
Node_5.MessageOut, 
Node_6.MessageOut, 
Node_7.MessageOut, 
Node_8.MessageOut, 
Node_9.MessageOut, 
Node_10.MessageOut, 
NONE, 
NONE, 
NONE, 
Node_1.MsgVector, Node_2.MsgVector, Node_3.MsgVector, Node_4.MsgVector, 
Node_5.MsgVector, Node_6.MsgVector, Node_7.MsgVector, 
Node_8.MsgVector, Node_9.MsgVector, Node_10.MsgVector) ; 

Node_2 : Node (Node_Id2, P_N2, Global_Clock, SourceNodeId, 
Node_1.MessageOut, 
Node_2.MessageOut, 
Node_3.MessageOut, 
Node_4.MessageOut, 
Node_5.MessageOut, 
Node_6.MessageOut, 
Node_7.MessageOut, 
NONE, 
Node_8.MessageOut, 
Node_9.MessageOut, 
Node_10.MessageOut, 
NONE, 
NONE, 
Node_1.MsgVector, Node_2.MsgVector, Node_3.MsgVector, Node_4.MsgVector, 
Node_5.MsgVector, Node_6.MsgVector, Node_7.MsgVector, 
Node_8.MsgVector, Node_9.MsgVector, Node_10.MsgVector) ; 

Node_3 : Node (Node_Id3, P_N3, Global_Clock, SourceNodeId, 
Node_1.MessageOut, 
Node_2.MessageOut, 
Node_3.MessageOut, 

Node_4. MessageOut, 

Node_5.MessageOut, 

Node_6.MessageOut, 

NONE, 

NONE, 

Node_7. MessageOut, 

Node_8.MessageOut, 

NONE, 

Node_9. MessageOut, 

Node_10.MessageOut, 

Node_1.MsgVector, Node_2.MsgVector, Node_3.MsgVector, Node_4.MsgVector, 
Node_5.MsgVector, Node_6.MsgVector, Node_7.MsgVector, 

Node_8.MsgVector, Node_9.MsgVector, Node_10.MsgVector) ; 

Node_4 : Node (Node_Id4, P_N4, Global_Clock, SourceNodeId,
    Node_1.MessageOut,
    Node_2.MessageOut,
    Node_3.MessageOut,
    Node_4.MessageOut,
--  Node_5.MessageOut,
    NONE,
    Node_6.MessageOut,
--  Node_7.MessageOut,
    NONE,
    Node_8.MessageOut,
    Node_9.MessageOut,
--  Node_10.MessageOut,
    NONE,
    Node_1.MsgVector, Node_2.MsgVector, Node_3.MsgVector, Node_4.MsgVector,
    Node_5.MsgVector, Node_6.MsgVector, Node_7.MsgVector,
    Node_8.MsgVector, Node_9.MsgVector, Node_10.MsgVector) ;

Node_5 : Node (Node_Id5, P_N5, Global_Clock, SourceNodeId,
    Node_1.MessageOut,
    Node_2.MessageOut,
--  Node_3.MessageOut,
    NONE,
    Node_4.MessageOut,
    Node_5.MessageOut,
--  Node_6.MessageOut,
    NONE,
    Node_7.MessageOut,
    Node_8.MessageOut,
--  Node_9.MessageOut,
    NONE,
    Node_10.MessageOut,
    Node_1.MsgVector, Node_2.MsgVector, Node_3.MsgVector, Node_4.MsgVector,
    Node_5.MsgVector, Node_6.MsgVector, Node_7.MsgVector,
    Node_8.MsgVector, Node_9.MsgVector, Node_10.MsgVector) ;

Node_6 : Node (Node_Id6, P_N6, Global_Clock, SourceNodeId,
    Node_1.MessageOut,
    Node_2.MessageOut,
--  Node_3.MessageOut,
--  Node_4.MessageOut,
    NONE,
    NONE,
    Node_5.MessageOut,
    Node_6.MessageOut,
    Node_7.MessageOut,
--  Node_8.MessageOut,
    NONE,
    Node_9.MessageOut,
    Node_10.MessageOut,
    Node_1.MsgVector, Node_2.MsgVector, Node_3.MsgVector, Node_4.MsgVector,
    Node_5.MsgVector, Node_6.MsgVector, Node_7.MsgVector,
    Node_8.MsgVector, Node_9.MsgVector, Node_10.MsgVector) ;

Node_7 : Node (Node_Id7, P_N7, Global_Clock, SourceNodeId,
    Node_1.MessageOut,
--  Node_2.MessageOut,
    NONE,
    Node_3.MessageOut,
--  Node_4.MessageOut,
    NONE,
    Node_5.MessageOut,
--  Node_6.MessageOut,
    NONE,
    Node_7.MessageOut,
    Node_8.MessageOut,
    Node_9.MessageOut,
    Node_10.MessageOut,
    Node_1.MsgVector, Node_2.MsgVector, Node_3.MsgVector, Node_4.MsgVector,
    Node_5.MsgVector, Node_6.MsgVector, Node_7.MsgVector,
    Node_8.MsgVector, Node_9.MsgVector, Node_10.MsgVector) ;

Node_8 : Node (Node_Id8, P_N8, Global_Clock, SourceNodeId,
--  Node_1.MessageOut,
--  Node_2.MessageOut,
    NONE,
    NONE,
    Node_3.MessageOut,
    Node_4.MessageOut,
    Node_5.MessageOut,
    Node_6.MessageOut,
--  Node_7.MessageOut,
    NONE,
    Node_8.MessageOut,
    Node_9.MessageOut,
    Node_10.MessageOut,
    Node_1.MsgVector, Node_2.MsgVector, Node_3.MsgVector, Node_4.MsgVector,
    Node_5.MsgVector, Node_6.MsgVector, Node_7.MsgVector,
    Node_8.MsgVector, Node_9.MsgVector, Node_10.MsgVector) ;

Node_9 : Node (Node_Id9, P_N9, Global_Clock, SourceNodeId,
--  Node_1.MessageOut,
    NONE,
    Node_2.MessageOut,
    Node_3.MessageOut,
--  Node_4.MessageOut,
--  Node_5.MessageOut,
    NONE,
    NONE,
    Node_6.MessageOut,
    Node_7.MessageOut,
    Node_8.MessageOut,
    Node_9.MessageOut,
    Node_10.MessageOut,
    Node_1.MsgVector, Node_2.MsgVector, Node_3.MsgVector, Node_4.MsgVector,
    Node_5.MsgVector, Node_6.MsgVector, Node_7.MsgVector,
    Node_8.MsgVector, Node_9.MsgVector, Node_10.MsgVector) ;

Node_10 : Node (Node_Id10, P_N10, Global_Clock, SourceNodeId,
--  Node_1.MessageOut,
--  Node_2.MessageOut,
--  Node_3.MessageOut,
    NONE,
    NONE,
    NONE,
    Node_4.MessageOut,
    Node_5.MessageOut,
    Node_6.MessageOut,
    Node_7.MessageOut,
    Node_8.MessageOut,
    Node_9.MessageOut,
    Node_10.MessageOut,
    Node_1.MsgVector, Node_2.MsgVector, Node_3.MsgVector, Node_4.MsgVector,
    Node_5.MsgVector, Node_6.MsgVector, Node_7.MsgVector,
    Node_8.MsgVector, Node_9.MsgVector, Node_10.MsgVector) ;





DEFINE 


VoteTime := (Global_Clock >= TimeToVote) ; 

GlobalAgreement := (Global_Clock = TimeToVote) & 

(Node_1.VoteResult & Node_2.VoteResult &
Node_3.VoteResult & Node_4.VoteResult &
Node_5.VoteResult & Node_6.VoteResult &
Node_7.VoteResult & Node_8.VoteResult &
Node_9.VoteResult & Node_10.VoteResult) ;


ASSIGN 


init (Global_Clock) := 0 ; 
next (Global_Clock) :=
case
(Global_Clock < GlobalClockMax) : Global_Clock + 1 ;
1 : Global_Clock ;
esac ;


SPEC 


-- Proposition #1 : 
-- AF (VoteTime) 


-- Proposition #2: 

AF (VoteTime & GlobalAgreement) -- true


-- end of main 


-- Good node, for link-fault model. 


MODULE Node (NodeId, MyP, Global_Clock, SourceNode,
N1_Msg, N2_Msg, N3_Msg, N4_Msg, N5_Msg, N6_Msg, N7_Msg, N8_Msg, N9_Msg, N10_Msg,
N1_MsgVector, N2_MsgVector, N3_MsgVector, N4_MsgVector, N5_MsgVector,
N6_MsgVector, N7_MsgVector, N8_MsgVector, N9_MsgVector, N10_MsgVector)


VAR 

RoundNum : 0 .. 3 ; 

MessageOut : {NONE, Sync, Relay} ; 

-- Messages received (Rx) from the nodes are saved in this vector and will be
-- broadcast/used during round 3.

MsgVector : array 1..10 of {0, 1} ;

VoteResult : boolean ; 

DEFINE 


Count_Sync := 

((N1_Msg = Sync) +
(N2_Msg = Sync) +
(N3_Msg = Sync) + 
(N4_Msg = Sync) + 
(N5_Msg = Sync) + 
(N6_Msg = Sync) + 
(N7_Msg = Sync) + 
(N8_Msg = Sync) + 
(N9_Msg = Sync) + 
(N10_Msg = Sync)) ; 

Count_Relay := 

((N1_Msg = Relay) + 
(N2_Msg = Relay) + 
(N3_Msg = Relay) + 
(N4_Msg = Relay) + 
(N5_Msg = Relay) + 
(N6_Msg = Relay) + 
(N7_Msg = Relay) + 
(N8_Msg = Relay) + 
(N9_Msg = Relay) + 
(N10_Msg = Relay)) ; 

Count_MyVector := 
(MsgVector [1] + 
MsgVector [2] + 
MsgVector [3] + 
MsgVector [4] + 
MsgVector [5] + 
MsgVector [6] + 
MsgVector [7] + 
MsgVector [8] + 
MsgVector [9] + 
MsgVector [10]) ; 

Count_MatrixColumn_1 :=
(N1_MsgVector [1] +
N2_MsgVector [1] +
N3_MsgVector [1] +
N4_MsgVector [1] +
N5_MsgVector [1] +
N6_MsgVector [1] +
N7_MsgVector [1] +
N8_MsgVector [1] +
N9_MsgVector [1] +
N10_MsgVector [1]) - F ;

Count_MatrixColumn_2 :=
(N1_MsgVector [2] +
N2_MsgVector [2] +
N3_MsgVector [2] +
N4_MsgVector [2] +
N5_MsgVector [2] +
N6_MsgVector [2] +
N7_MsgVector [2] +
N8_MsgVector [2] +
N9_MsgVector [2] +
N10_MsgVector [2]) - F ;

Count_MatrixColumn_3 :=
(N1_MsgVector [3] +
N2_MsgVector [3] +
N3_MsgVector [3] +
N4_MsgVector [3] +
N5_MsgVector [3] +
N6_MsgVector [3] +
N7_MsgVector [3] +
N8_MsgVector [3] +
N9_MsgVector [3] +
N10_MsgVector [3]) - F ;

Count_MatrixColumn_4 :=
(N1_MsgVector [4] +
N2_MsgVector [4] +
N3_MsgVector [4] +
N4_MsgVector [4] +
N5_MsgVector [4] +
N6_MsgVector [4] +
N7_MsgVector [4] +
N8_MsgVector [4] +
N9_MsgVector [4] +
N10_MsgVector [4]) - F ;

Count_MatrixColumn_5 :=
(N1_MsgVector [5] +
N2_MsgVector [5] +
N3_MsgVector [5] +
N4_MsgVector [5] +
N5_MsgVector [5] +
N6_MsgVector [5] +
N7_MsgVector [5] +
N8_MsgVector [5] +
N9_MsgVector [5] +
N10_MsgVector [5]) - F ;

Count_MatrixColumn_6 :=
(N1_MsgVector [6] +
N2_MsgVector [6] +
N3_MsgVector [6] +
N4_MsgVector [6] +
N5_MsgVector [6] +
N6_MsgVector [6] +
N7_MsgVector [6] +
N8_MsgVector [6] +
N9_MsgVector [6] +
N10_MsgVector [6]) - F ;

Count_MatrixColumn_7 :=
(N1_MsgVector [7] +
N2_MsgVector [7] +
N3_MsgVector [7] +
N4_MsgVector [7] +
N5_MsgVector [7] +
N6_MsgVector [7] +
N7_MsgVector [7] +
N8_MsgVector [7] +
N9_MsgVector [7] +
N10_MsgVector [7]) - F ;

Count_MatrixColumn_8 :=
(N1_MsgVector [8] +
N2_MsgVector [8] +
N3_MsgVector [8] +
N4_MsgVector [8] +
N5_MsgVector [8] +
N6_MsgVector [8] +
N7_MsgVector [8] +
N8_MsgVector [8] +
N9_MsgVector [8] +
N10_MsgVector [8]) - F ;


Count_MatrixColumn_9 :=
(N1_MsgVector [9] +
N2_MsgVector [9] +
N3_MsgVector [9] +
N4_MsgVector [9] +
N5_MsgVector [9] +
N6_MsgVector [9] +
N7_MsgVector [9] +
N8_MsgVector [9] +
N9_MsgVector [9] +
N10_MsgVector [9]) - F ;

Count_MatrixColumn_10 := 

(N1_MsgVector [10] +
N2_MsgVector [10] +
N3_MsgVector [10] +
N4_MsgVector [10] +
N5_MsgVector [10] +
N6_MsgVector [10] +
N7_MsgVector [10] +
N8_MsgVector [10] +
N9_MsgVector [10] +
N10_MsgVector [10]) - F ;

VoteResult :=
(RoundNum = 3) &
(((Count_MatrixColumn_1 >= FPlusOne) +
(Count_MatrixColumn_2 >= FPlusOne) +
(Count_MatrixColumn_3 >= FPlusOne) +
(Count_MatrixColumn_4 >= FPlusOne) +
(Count_MatrixColumn_5 >= FPlusOne) +
(Count_MatrixColumn_6 >= FPlusOne) +
(Count_MatrixColumn_7 >= FPlusOne) +
(Count_MatrixColumn_8 >= FPlusOne) +
(Count_MatrixColumn_9 >= FPlusOne) +
(Count_MatrixColumn_10 >= FPlusOne)) >= TwoFPlusOne) ;
-- (Count_MatrixColumn_10 >= FPlusOne)) >= FPlusTwo) ;


ASSIGN 


init (RoundNum) := 0 ; 

-- init (MessageOut) := {NONE, Sync, Relay} ;
init (MessageOut) := NONE ; 

init (MsgVector [1]) := 0 ; 
init (MsgVector [2]) := 0 ; 
init (MsgVector [3]) := 0 ; 
init (MsgVector [4]) := 0 ; 
init (MsgVector [5]) := 0 ; 
init (MsgVector [6]) := 0 ; 
init (MsgVector [7]) := 0 ; 
init (MsgVector [8]) := 0 ; 
init (MsgVector [9]) := 0 ; 
init (MsgVector [10]) := 0 ; 


next (RoundNum) := 
case 

(RoundNum = 3) : 0 ; 

(Global_Clock = MyP) & (SourceNode = NodeId) : 1 ;

!(SourceNode = NodeId) & (Count_Sync > 0) : 2 ;

-- If at least F Relays and one Sync, from the source of course, then Round = 3.
((Count_Relay + MsgVector [SourceNode]) >= FPlusOne) : 3 ;

-- Or, alternatively, the following will do.
(Count_Relay + Count_MyVector >= FPlusOne) : 3 ;

1 : RoundNum ; 
esac ; 


next (MessageOut) := 
case 

(Global_Clock = MyP) & (SourceNode = NodeId) : Sync ;
(Count_Sync > 0) : Relay ; 

1 : NONE ; 
esac ;

next (MsgVector [1]) :=
case 

(N1_Msg = Sync) | (N1_Msg = Relay) : 1 ; 

1 : MsgVector [1] ; 
esac ; 

next (MsgVector [2]) := 
case 

(N2_Msg = Sync) | (N2_Msg = Relay) : 1 ;

1 : MsgVector [2] ; 
esac ; 

next (MsgVector [3]) := 
case 

(N3_Msg = Sync) | (N3_Msg = Relay) : 1 ; 

1 : MsgVector [3] ; 
esac ; 

next (MsgVector [4]) := 
case 

(N4_Msg = Sync) | (N4_Msg = Relay) : 1 ; 

1 : MsgVector [4] ; 
esac ; 

next (MsgVector [5]) := 
case 

(N5_Msg = Sync) | (N5_Msg = Relay) : 1 ; 

1 : MsgVector [5] ; 
esac ; 

next (MsgVector [6]) := 
case 

(N6_Msg = Sync) | (N6_Msg = Relay) : 1 ; 

1 : MsgVector [6] ; 
esac ; 

next (MsgVector [7]) := 
case 

(N7_Msg = Sync) | (N7_Msg = Relay) : 1 ;

1 : MsgVector [7] ; 
esac ; 

next (MsgVector [8]) := 
case 

(N8_Msg = Sync) | (N8_Msg = Relay) : 1 ; 

1 : MsgVector [8] ; 
esac ; 

next (MsgVector [9]) := 
case 

(N9_Msg = Sync) | (N9_Msg = Relay) : 1 ; 

1 : MsgVector [9] ; 
esac ; 

next (MsgVector [10]) := 
case 

(N10_Msg = Sync) | (N10_Msg = Relay) : 1 ; 

1 : MsgVector [10] ;
esac ; 


-- end of Node 
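
The vote computed by the Node module can be traced by hand outside the model checker. The following Python sketch is an illustration only, not part of the SMV model: it mirrors the Count_MatrixColumn and VoteResult definitions for a K-by-K matrix of MsgVector rows, taking K = 10 and F = 3 from the listing and using the TwoFPlusOne threshold by default. The function names, the `column_threshold` parameter, and the cyclic pattern of missing messages are assumptions introduced for the example, and the RoundNum = 3 condition is left out.

```python
# Illustrative sketch only: function and variable names are assumptions,
# and the matrices built below are hypothetical inputs, not model-checker output.
K = 10  # network size, as in the listing
F = 3   # maximum number of simultaneous faults, as in the listing


def vote_result(msg_matrix, k=K, f=F, column_threshold=None):
    """Evaluate the vote over a k-by-k matrix of 0/1 entries.

    msg_matrix[i][j] is 1 when node i+1 has recorded a message from node j+1.
    """
    if column_threshold is None:
        column_threshold = 2 * f + 1  # TwoFPlusOne
    # Count_MatrixColumn_j: column sum minus F.
    adjusted = [sum(row[j] for row in msg_matrix) - f for j in range(k)]
    # Number of columns whose adjusted count reaches F + 1 (FPlusOne).
    passing = sum(1 for count in adjusted if count >= f + 1)
    return passing >= column_threshold


def matrix_with_missing(k, missing):
    """Hypothetical pattern: node i misses the next `missing` senders, cyclically."""
    return [[0 if 0 < (j - i) % k <= missing else 1 for j in range(k)]
            for i in range(k)]
```

With `missing = F`, every column of the hypothetical matrix sums to K - F = 7, so each adjusted count is 4 = F + 1 and all ten columns pass. With `missing = F + 1`, every adjusted count drops to 3 and the vote fails.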